
    Multi-Agent Simulation of Emergence of Schwa Deletion Pattern in Hindi

    Recently, there has been a revival of interest in multi-agent simulation techniques for exploring the nature of language change. However, a lack of appropriate validation of simulation experiments against real language data often calls into question the general applicability of these methods in modeling realistic language change. We address this issue by modeling the phenomenon of schwa deletion in Hindi through a multi-agent simulation framework. The pattern of Hindi schwa deletion and its diachronic nature are well studied, not only out of general linguistic inquiry, but also to facilitate Hindi grapheme-to-phoneme conversion, which is a preprocessing step for text-to-speech synthesis. We show that under certain conditions, the schwa deletion pattern observed in modern Hindi emerges in the system from an initial state of no deletion. The simulation framework described in this work can be extended to model other phonological changes as well.
    Keywords: Language Change, Linguistic Agent, Language Game, Multi-Agent Simulation, Schwa Deletion
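    The abstract does not spell out the model's internals, but the general shape of such a language-game simulation can be sketched as below. Everything in the sketch is an illustrative assumption (the Agent class, the imitation-style update rule, the articulatory-ease bias, the word keys); the actual framework conditions deletion on phonological context and is validated against Hindi data.

```python
# Minimal sketch of a multi-agent "language game" for schwa deletion.
# All names, probabilities, and the update rule are illustrative assumptions,
# not the paper's actual model.
import random


class Agent:
    def __init__(self, words):
        # each word starts with zero probability of deleting its medial schwa
        self.deletion_prob = {w: 0.0 for w in words}

    def speak(self, word):
        # True means the speaker produces the schwa-deleted variant
        return random.random() < self.deletion_prob[word]

    def listen(self, word, deleted, step=0.05):
        # assumed update rule: nudge own production toward the heard variant
        target = 1.0 if deleted else 0.0
        p = self.deletion_prob[word]
        self.deletion_prob[word] = p + step * (target - p)


def simulate(words, n_agents=20, rounds=50000, articulation_bias=0.02):
    agents = [Agent(words) for _ in range(n_agents)]
    for _ in range(rounds):
        speaker, listener = random.sample(agents, 2)
        word = random.choice(words)
        deleted = speaker.speak(word)
        # a small articulatory-ease bias occasionally introduces a deletion,
        # which can then spread through the population
        if not deleted and random.random() < articulation_bias:
            deleted = True
        listener.listen(word, deleted)
    # mean deletion probability per word across the population
    return {w: sum(a.deletion_prob[w] for a in agents) / n_agents for w in words}


if __name__ == "__main__":
    # hypothetical word keys; real experiments would use a Hindi lexicon
    print(simulate(["dhakan", "samajh"]))
```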

    LoNLI: An Extensible Framework for Testing Diverse Logical Reasoning Capabilities for NLI

    Natural Language Inference (NLI) is considered a representative task for testing natural language understanding (NLU). In this work, we propose an extensible framework to collectively yet categorically test the diverse logical reasoning capabilities required for NLI (and, by extension, NLU). Motivated by behavioral testing, we create a semi-synthetic large test bench (363 templates, 363k examples) and an associated framework that offers the following utilities: 1) individually test and analyze reasoning capabilities along 17 reasoning dimensions (including pragmatic reasoning); 2) design experiments to study cross-capability information content (leave one out or bring one in); and 3) control for artifacts and biases, which the synthetic nature of the data makes possible. The power of automated test-case instantiation from free-form natural language templates (inherited from CheckList) and a well-defined taxonomy of capabilities enable us to extend to (cognitively) harder test cases while varying the complexity of the natural language. Through our analysis of state-of-the-art NLI systems, we observe that our benchmark is indeed hard (and non-trivial even with training on additional resources). Some capabilities stand out as harder than others. Further fine-grained analysis and fine-tuning experiments reveal more insights about these capabilities and the models, supporting and extending previous observations. Towards the end, we also perform a user study to investigate whether behavioral information can be utilised to help some models generalize much better than others.
    Comment: arXiv admin note: substantial text overlap with arXiv:2107.0722
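    As a rough illustration of how templated test cases of this kind can be instantiated (in the spirit of CheckList), consider the minimal sketch below; the template wording, the "temporal" capability label, and the lexicons are invented for illustration and are not drawn from the LoNLI test bench.

```python
# Illustrative sketch of template-based NLI test-case instantiation.
from itertools import product

# hypothetical template for one reasoning dimension
template = {
    "capability": "temporal",
    "premise": "{name} visited {city} before moving to {city2}.",
    "hypothesis": "{name} moved to {city2} after visiting {city}.",
    "label": "entailment",
}

# hypothetical lexicons used to fill the template slots
lexicons = {
    "name": ["John", "Priya"],
    "city": ["Paris", "Delhi"],
    "city2": ["Berlin", "Mumbai"],
}


def instantiate(template, lexicons):
    # expand the template over the cross product of all slot fillers
    keys = list(lexicons)
    for values in product(*(lexicons[k] for k in keys)):
        slots = dict(zip(keys, values))
        yield {
            "capability": template["capability"],
            "premise": template["premise"].format(**slots),
            "hypothesis": template["hypothesis"].format(**slots),
            "label": template["label"],
        }


for example in instantiate(template, lexicons):
    print(example)
```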

    LLM-powered Data Augmentation for Enhanced Cross-lingual Performance

    This paper explores the potential of leveraging Large Language Models (LLMs) for data augmentation in multilingual commonsense reasoning datasets where the available training data is extremely limited. To achieve this, we utilise several LLMs, namely Dolly-v2, StableVicuna, ChatGPT, and GPT-4, to augment three datasets: XCOPA, XWinograd, and XStoryCloze. Subsequently, we evaluate the effectiveness of fine-tuning smaller multilingual models, mBERT and XLMR, on the synthesised data. We compare the performance of training with data generated in English and in the target languages, as well as with translated English-generated data, revealing the overall advantages of incorporating data generated by LLMs, e.g. a notable improvement of 13.4 accuracy points in the best case. Furthermore, we conduct a human evaluation by asking native speakers to assess the naturalness and logical coherence of the generated examples across different languages. The results indicate that LLMs such as ChatGPT and GPT-4 excel at producing natural and coherent text in most languages; however, they struggle to generate meaningful text in certain languages, such as Tamil. We also observe that ChatGPT falls short in generating plausible alternatives compared to the original dataset, whereas examples from GPT-4 exhibit competitive logical consistency.
    Comment: EMNLP 2023 Main Conference
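    The augment-then-fine-tune pipeline described above might look roughly like the following sketch. The prompt wording, the generate_with_llm placeholder, the model name, and the hyperparameters are assumptions rather than the paper's actual configuration, and the binary-choice XCOPA task is cast here as two-way sequence classification for simplicity.

```python
# Hedged sketch: generate synthetic commonsense examples with an LLM, then
# fine-tune a smaller multilingual encoder on the combined data.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

# hypothetical generation prompt for XCOPA-style items
PROMPT = (
    "Write a new commonsense causal reasoning example in {language}.\n"
    "Premise: ...\nQuestion: cause or effect?\n"
    "Choice 1: ...\nChoice 2: ...\nAnswer (1 or 2): ..."
)


def generate_with_llm(prompt: str) -> dict:
    """Placeholder for a call to Dolly-v2, StableVicuna, ChatGPT, or GPT-4."""
    raise NotImplementedError


def augment(n_examples: int, language: str) -> list:
    # collect synthetic XCOPA-style items from the LLM
    return [generate_with_llm(PROMPT.format(language=language))
            for _ in range(n_examples)]


def finetune(train_dataset, model_name: str = "xlm-roberta-base"):
    # train_dataset is assumed to be already tokenized and to combine the
    # original training data with the synthetic examples
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name,
                                                               num_labels=2)
    args = TrainingArguments(output_dir="xcopa-augmented",
                             num_train_epochs=3,
                             per_device_train_batch_size=16)
    trainer = Trainer(model=model, args=args,
                      train_dataset=train_dataset, tokenizer=tokenizer)
    trainer.train()
    return model
```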